
    Advanced Representation Learning for Dense Prediction Tasks in Medical Image Analysis

    Machine learning is a rapidly growing field of artificial intelligence that allows computers to learn from data and make predictions, typically guided by human-provided labels. However, traditional machine learning methods have many drawbacks: they are time-consuming, inefficient, prone to task-specific bias, and dependent on a large amount of domain knowledge. Representation learning, a subfield of machine learning, focuses on learning meaningful and useful features or representations from input data. It aims to automatically learn relevant features from raw data, saving time, increasing efficiency and generalization, and reducing reliance on expert knowledge. Recently, deep learning has further accelerated the development of representation learning. It leverages deep architectures to extract complex and abstract representations, significantly outperforming earlier approaches in many areas. In the field of computer vision, deep learning has made remarkable progress, particularly in high-level and real-world tasks. Because deep learning methods do not require handcrafted features and can understand complex visual information, they enable researchers to design automated systems that make accurate diagnoses and interpretations, especially in medical image analysis. Deep learning has achieved state-of-the-art performance in many medical image analysis tasks, such as medical image regression/classification, generation, and segmentation. Compared to regression/classification, medical image generation and segmentation are more complex dense prediction tasks that require understanding semantic representations and producing pixel-level predictions. This thesis focuses on designing representation learning methods to improve the performance of dense prediction tasks in medical image analysis. With advances in imaging technology, increasingly complex medical images have become available in this field.
In contrast to traditional machine learning algorithms, current deep learning-based representation learning methods provide an end-to-end approach that automatically extracts representations from complex data without manual feature engineering. In medical image analysis, three distinctive challenges call for advanced representation learning architectures, i.e., limited labeled medical images, overfitting with limited data, and lack of interpretability. To address these challenges, we aim to design robust representation learning architectures for the two main directions of dense prediction tasks, namely medical image generation and segmentation. For medical image generation, we focus on chromosome straightening, which involves generating a straightened chromosome image from a curved chromosome input. The challenges of this task include insufficient training images and corresponding ground truth, as well as the non-rigid nature of chromosomes, which leads to distorted details and shapes after straightening. We first propose a study for the chromosome straightening task: a novel framework using image-to-image translation, whose efficacy and robustness we demonstrate in generating straightened chromosomes. The framework addresses the challenge of limited training data and outperforms existing studies. We then present a subsequent study that addresses the limitations of this framework, resulting in new state-of-the-art performance with better interpretability and generalization capability. The new robust chromosome straightening framework, named Vit-Patch GAN, instead learns the motion representation of chromosomes for straightening while retaining more details of shape and banding patterns. For medical image segmentation, we focus on fovea localization, which we reformulate from a localization task into small-region segmentation.
Accurate segmentation of the fovea region is crucial for monitoring and analyzing retinal diseases to prevent irreversible vision loss. This task also requires the incorporation of global features to effectively identify the fovea region and handle hard cases associated with retinal diseases and non-standard fovea locations. We first propose a novel two-branch architecture, Bilateral-ViT, for fovea localization framed as retina image segmentation. This vision-transformer-based architecture incorporates global image context and blood vessel structure, surpasses existing methods, and achieves state-of-the-art results on two public datasets. We then propose a subsequent method to further improve fovea localization: a novel dual-stream deep learning architecture called Bilateral-Fuser. In contrast to Bilateral-ViT, Bilateral-Fuser globally incorporates long-range connections from multiple cues, including fundus and vessel distributions. Moreover, with the newly designed Bilateral Token Incorporation module, Bilateral-Fuser learns anatomical-aware tokens, significantly reducing computational costs while achieving new state-of-the-art performance. Our comprehensive experiments also demonstrate that Bilateral-Fuser achieves better accuracy and robustness on both normal and diseased retina images, with excellent generalization capability.

    Bilateral-Fuser: A Novel Multi-cue Fusion Architecture with Anatomical-aware Tokens for Fovea Localization

    Accurate localization of the fovea is one of the primary steps in analyzing retinal diseases, since it helps prevent irreversible vision loss. Although current deep learning-based methods perform better than traditional methods, challenges remain, such as insufficient use of anatomical landmarks and sensitivity to diseased retinal images and varying image conditions. In this paper, we propose a novel transformer-based architecture (Bilateral-Fuser) for multi-cue fusion. This architecture explicitly incorporates long-range connections and global features using retina and vessel distributions for robust fovea localization. We introduce a spatial attention mechanism in the dual-stream encoder for extracting and fusing self-learned anatomical information. This design focuses more on features distributed along blood vessels and significantly decreases computational costs by reducing the number of tokens. Our comprehensive experiments show that the proposed architecture achieves state-of-the-art performance on two public datasets and one large-scale private dataset. We also show that Bilateral-Fuser is more robust on both normal and diseased retina images and has better generalization capacity in cross-dataset experiments.
    Comment: This paper is prepared for IEEE TRANSACTIONS ON MEDICAL IMAGING

    An Event-Triggered Low-Cost Tactile Perception System for Social Robot's Whole Body Interaction

    Social interaction is one of the necessary skills for social robots to better integrate into human society. However, current social robots interact mainly through audio and visual means, with little reliance on haptic interaction. Many obstacles still keep social robots from interacting through touch: 1) the complex manufacturing process of the tactile sensor array is the main obstacle to lowering production costs; 2) haptic interaction modes are complex and diverse, and there are no public social robot interaction standards or datasets for tactile interactive behavior. In view of this, our research looks into the following aspects of the tactile perception system: 1) development of a low-cost tactile sensor array, covering the sensing principle, simulation, manufacture, front-end electronics, and testing, which is then applied to the social robot's whole body; 2) establishment of a tactile interactive model and an event-triggered perception model for social interactive applications, together with the design of preprocessing and classification algorithms. In this research, we use k-nearest neighbors, decision trees, support vector machines, and other classification algorithms to classify touch behaviors into six different classes. In particular, the cosine k-nearest neighbors and quadratic support vector machine classifiers achieve an overall mean accuracy of more than 68%, with individual accuracy rates of more than 80%. In short, our research provides new directions for achieving low-cost intelligent touch interaction for social robots in real environments. The low-cost tactile sensor array solution and interactive models are expected to be applied to social robots on a large scale.
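The paper's exact features and training data are not given; as a minimal sketch of the cosine k-nearest-neighbors idea, the toy vectors below stand in for preprocessed tactile readings, and the class names are hypothetical:

```python
import math
from collections import Counter

def cosine_distance(a, b):
    # Cosine distance between two feature vectors (e.g., flattened
    # pressure frames from a tactile sensor array).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def knn_classify(train, labels, query, k=3):
    # Label a touch sample by majority vote among its k nearest
    # training samples under cosine distance.
    ranked = sorted(range(len(train)),
                    key=lambda i: cosine_distance(train[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors standing in for preprocessed tactile readings.
train = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.8], [0.0, 1.0, 0.9]]
labels = ["pat", "pat", "stroke", "stroke"]
print(knn_classify(train, labels, [0.85, 0.15, 0.05]))  # → pat
```

Cosine distance ignores overall pressure magnitude, so the vote depends on the spatial pattern of the touch rather than how hard the robot was pressed.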

    RC-Net: Regression Correction for End-To-End Chromosome Instance Segmentation

    Precise segmentation of chromosomes in real microscope images is significant for karyotype analysis. Image segmentation is usually formulated as a pixel-level classification task that treats different instances as different classes. Many instance segmentation methods predict the Intersection over Union (IoU) through a head branch to correct the classification confidence; their effectiveness rests on the correlation between branch tasks. However, none of these methods consider the correlation between input and output in the branch tasks. Herein, we propose a chromosome instance segmentation network based on regression correction. First, we adopt two head branches to predict two confidences that are more related to localization accuracy and segmentation accuracy, respectively, and use them to correct the classification confidence, which reduces the omission of predicted boxes in NMS. Furthermore, an NMS algorithm is designed to screen the target segmentation mask using the IoU of the overlapping instance, which reduces the omission of predicted masks in NMS. Moreover, given that the original IoU loss function is not sensitive to wrong segmentation, a K-IoU loss function is defined to strengthen the penalty on wrong segmentation, which rationalizes the loss of mis-segmentation and effectively prevents it. Finally, an ablation experiment is designed to evaluate the effectiveness of the proposed network, showing that our method can effectively enhance performance in automatic chromosome segmentation tasks and provide a guarantee for end-to-end karyotype analysis.
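The paper's exact correction scheme and K-IoU formula are not reproduced here; as a simplified sketch of confidence-corrected NMS, the snippet below ranks detections by classification confidence multiplied by a predicted quality score (an assumption standing in for the two head-branch confidences) before greedy suppression:

```python
def box_iou(a, b):
    # IoU of two axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def corrected_nms(dets, iou_thresh=0.5):
    # Greedy NMS ranked by a corrected score: classification confidence
    # multiplied by a predicted localization-quality score.
    # `dets` is a list of (box, cls_conf, quality_pred) tuples.
    order = sorted(dets, key=lambda d: d[1] * d[2], reverse=True)
    keep = []
    for box, cls, q in order:
        if all(box_iou(box, kb) < iou_thresh for kb, _, _ in keep):
            keep.append((box, cls, q))
    return keep

dets = [((0, 0, 10, 10), 0.9, 0.5),   # confident but poorly localized
        ((1, 1, 10, 10), 0.7, 0.9),   # well localized: higher corrected score
        ((20, 20, 30, 30), 0.8, 0.8)]
print(len(corrected_nms(dets)))  # → 2
```

With plain classification-ranked NMS the poorly localized box would win its cluster; weighting by predicted quality lets the better-localized box survive instead, which is the intuition behind correcting confidence before suppression.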

    Intention Understanding in Human-Robot Interaction Based on Visual-NLP Semantics

    With the rapid development of robotics and AI technology in recent years, human-robot interaction has made great advances with practical social impact. Verbal commands are one of the most direct and frequently used means of human-robot interaction. Currently, such technology enables robots to execute pre-defined tasks based on simple, direct, and explicit language instructions; e.g., certain keywords must be used and detected. However, that is not the natural way for humans to communicate. In this paper, we propose a novel task-based framework that enables the robot to comprehend human intentions using visual semantic information, so that the robot can satisfy human intentions expressed in natural language instructions (three types in total, namely clear, vague, and feeling, are defined and tested). The proposed framework includes a language semantics module to extract keywords regardless of how explicit the command instruction is, a visual object recognition module to identify the objects in front of the robot, and a similarity computation algorithm to infer the intention based on the given task. The task is then translated into commands for the robot accordingly. Experiments are performed and validated on a humanoid robot with a defined task: to pick the desired item out of multiple objects on the table and hand it over to one desired user out of multiple human participants. The results show that our algorithm can handle different types of instructions, even with unseen sentence structures.
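The paper's actual similarity computation is not specified in the abstract; as a minimal sketch of the idea, the snippet below scores each recognized object against instruction keywords with bag-of-words cosine similarity (the object names and descriptions are made up for illustration):

```python
import math
from collections import Counter

def cosine_sim(a, b):
    # Cosine similarity between two bag-of-words Counters.
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def infer_target(keywords, objects):
    # Pick the recognized object whose description best matches the
    # keywords extracted from the instruction.
    kw = Counter(keywords)
    scored = {name: cosine_sim(kw, Counter(desc))
              for name, desc in objects.items()}
    return max(scored, key=scored.get)

# Hypothetical recognized objects with word-level descriptions.
objects = {"water_bottle": ["bottle", "water", "drink", "plastic"],
           "apple": ["apple", "fruit", "red", "eat"]}
print(infer_target(["thirsty", "drink", "water"], objects))  # → water_bottle
```

Because the score is computed over overlapping description words rather than exact keyword matches, a vague instruction such as "I'm thirsty" can still resolve to the right object as long as at least one extracted keyword overlaps its description.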

    MicroRNA-212-5p Prevents Dopaminergic Neuron Death by Inhibiting SIRT2 in MPTP-Induced Mouse Model of Parkinson’s Disease

    Recently, emerging evidence has shown that sirtuins (SIRTs) modulate the aging process and affect neurodegenerative diseases. For example, inhibition of SIRT2 has been recognized to exert neuroprotective effects in Parkinson’s disease (PD). However, current SIRT2 inhibitors lack selectivity over its homologs. In this study, we found that the SIRT2 protein level was highly increased in a PD model and was negatively regulated by miR-212-5p. In detail, miR-212-5p transfection reduced SIRT2 expression and inhibited SIRT2 activity. In an in vivo study, miR-212-5p treatment prevented dopaminergic neuron loss and DAT reduction by targeting SIRT2, indicating that miR-212-5p exerts a neuroprotective effect in PD. Mechanistically, we found that nuclear acetylated p53 was up-regulated, consistent with p53 being a major deacetylation substrate of SIRT2. Furthermore, decreased cytoplasmic p53 promoted autophagy in the PD model, as shown by autophagosomes, autophagic flux, and LC3 B and p62 expression. Meanwhile, we also found that miR-212-5p treatment alleviated apoptosis in the PD model, although the underlying mechanisms remain to be elucidated. In conclusion, our study provides a direct link between miR-212-5p and SIRT2-mediated, p53-dependent programmed cell death in the pathogenesis of PD. These findings give insight into the development of highly specific SIRT2 inhibitors, opening up novel therapeutic avenues for PD.

    Plin4-Dependent Lipid Droplets Hamper Neuronal Mitophagy in the MPTP/p-Induced Mouse Model of Parkinson’s Disease

    Epidemiological studies have shown that both lipid metabolism disorders and mitochondrial dysfunction are correlated with the pathogenesis of neurodegenerative diseases (NDDs), including Parkinson’s disease (PD). Emerging evidence suggests that deposition of intracellular lipid droplets (LDs) participates in lipotoxicity and precedes neurodegeneration. Perilipin family members are recognized to facilitate LD movement and cellular signaling interactions. However, the direct interaction between Perilipin-regulated LD deposition and mitochondrial dysfunction in dopaminergic (DA) neurons remains obscure. Here, we demonstrate a novel type of lipid dysregulation involved in PD progression, as evidenced by upregulated expression of Plin4 (a coating protein and regulator of LDs) and increased intracellular LD deposition that correlated with the loss of TH-ir (tyrosine hydroxylase-immunoreactive) neurons in the MPTP/p-induced PD model mouse mesencephalon. Further, in vitro experiments showed that inhibition of LD storage by downregulating Plin4 promoted survival of SH-SY5Y cells. Mechanistically, reduced LD storage restored autophagy, leading to alleviation of mitochondrial damage, which in turn promoted cell survival. Moreover, the parkin-poly-Ub-p62 pathway was involved in this Plin4/LD-induced inhibition of mitophagy. These findings were further confirmed in primary cultures of DA neurons, in which autophagy inhibitor treatment significantly countermanded the ameliorations conferred by Plin4 silencing. Collectively, these experiments demonstrate that a dysfunctional Plin4/LD/mitophagy axis is involved in PD pathology and suggest Plin4-LDs as a potential biomarker as well as a therapeutic strategy for PD.

    Stepwise Feature Fusion: Local Guides Global

    Colonoscopy, currently the most efficient and widely recognized colon polyp detection technology, is necessary for early screening and prevention of colorectal cancer. However, due to the varying sizes and complex morphological features of colonic polyps, as well as the indistinct boundary between polyps and mucosa, accurate segmentation of polyps is still challenging. Deep learning has become popular for polyp segmentation and achieves excellent results. However, due to the structure of polyp images and the varying shapes of polyps, existing deep learning models easily overfit the current dataset; as a result, a model may not generalize to unseen colonoscopy data. To address this, we propose a new state-of-the-art model for medical image segmentation, SSFormer, which uses a pyramid Transformer encoder to improve the generalization ability of the model. Specifically, our proposed Progressive Locality Decoder can be adapted to the pyramid Transformer backbone to emphasize local features and restrict attention dispersion. SSFormer achieves state-of-the-art performance in both learning and generalization assessment.
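The Progressive Locality Decoder's internals are not given in the abstract; as a loose sketch of stepwise coarse-to-fine feature fusion (my simplification, using plain summation in place of the learned fusion blocks), each pyramid stage is upsampled and merged with the next higher-resolution stage:

```python
import numpy as np

def upsample2x(f):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return f.repeat(2, axis=1).repeat(2, axis=2)

def stepwise_fuse(pyramid):
    # Progressively merge pyramid features from coarse (global) to fine
    # (local): at each step the running aggregate is upsampled and summed
    # with the next higher-resolution stage, so local detail refines the
    # global context step by step.
    fused = pyramid[-1]                 # deepest, most global stage
    for feat in reversed(pyramid[:-1]):
        fused = upsample2x(fused) + feat
    return fused

# Toy 3-level pyramid: resolutions 16x16, 8x8, 4x4, each with 8 channels.
pyramid = [np.ones((8, 16, 16)), np.ones((8, 8, 8)), np.ones((8, 4, 4))]
print(stepwise_fuse(pyramid).shape)  # → (8, 16, 16)
```

A real decoder would replace the additions with learned convolutions or attention blocks, but the control flow, walking the pyramid one resolution at a time instead of fusing all scales at once, is the "stepwise" part.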

    Minimizing the programming power of phase change memory by using graphene nanoribbon edge-contact

    Nonvolatile phase change random access memory (PCRAM) is regarded as one of the promising candidates for emerging mass storage in the era of Big Data. However, its relatively high programming energy hinders further reduction of power consumption in PCRAM. Utilizing the narrow edge-contact of graphene can effectively reduce the active volume of phase change material in each cell and therefore realize low-power operation. Here, we demonstrate that the write energy can be reduced to ~53.7 fJ in a cell with a ~3 nm-wide graphene nanoribbon (GNR) as the edge-contact, whose cross-sectional area is only ~1 nm2. We find that the cycle endurance exhibits an obvious dependence on the bias polarity in cells with structural asymmetry: if a positive bias is applied to the graphene electrode, the endurance can be extended at least one order of magnitude longer than with the reversed polarity. This work represents a great technological advance for low-power PCRAM and could benefit in-memory computing in the future.
    Comment: 14 pages, 4 figures